# README
## Replication Package for "Structural Estimation Under Misspecification: Theory and Implications for Practice"
Andrews, Barahona, Gentzkow, Rambachan, and Shapiro

## Overview

The code in this replication package constructs the figures, tables, and scalar values found in our paper using R, Python, Matlab, and LyX.

The process uses raw data and three stages of code, each with a dedicated directory:
- `/dropbox/` houses raw data.
- `/analysis_cluster/` houses code used for running simulations and estimation on cluster servers.
- `/analysis_local/` houses code used for producing the plots and values displayed in figures, tables, and scalars in the paper.
- `/paper_slides/` houses code for filling the figures and tables with the associated plots and values and ultimately producing the final draft of the paper.

Other directories:
- `/setup/` houses important files for configuring the repository.
- `/lib/` houses shared functions used throughout the repository for running R and Python scripts.

The replication package can be run by following the instructions in the `Instructions to Replicators` section of this README. 

## Data Availability and Provenance Statements

### Statement about Rights
We certify that the authors of the manuscript have legitimate access to and permission to use the data used in this manuscript.

### Summary of Availability
Some data **cannot be made** publicly available.

### Summary Table
|Data Name|Data Files|Location|Provided|Citation|
|---------|----------|--------|--------|--------|
|MW data|`blp_beer_revised_final.mat`; `blp_iv_revised_final.mat`; `demos.csv`; `demosSum.csv`|dropbox/|FALSE|Miller and Weinberg 2017|

### Details on Each Data Source
The data for our study come from Miller and Weinberg (2017), hereafter MW.
* The files `demos.csv` and `demosSum.csv` can be obtained directly from MW's replication archive.
  * In `demos.csv`, keep only the columns `year`, `market`, and `realInc*`
* The files `blp_beer_revised_final.mat` and `blp_iv_revised_final.mat` can be obtained by following the instructions in MW's replication archive to obtain the original proprietary data, and then executing the code in their replication archive.
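
The column filter for `demos.csv` described above can be sketched in Python as follows. This is a minimal illustration and not part of the replication package: the function name `filter_demos_columns` and the file paths passed to it are hypothetical, and the sketch assumes that `realInc*` denotes every column whose name begins with `realInc`.

```python
import csv

KEEP_EXACT = {"year", "market"}
KEEP_PREFIX = "realInc"  # assumption: `realInc*` means "name starts with realInc"


def filter_demos_columns(src_path, dst_path):
    """Copy src_path to dst_path, keeping only year, market, and realInc* columns.

    Returns the list of column names that were kept, in their original order.
    """
    with open(src_path, newline="") as src:
        reader = csv.DictReader(src)
        kept = [
            name
            for name in (reader.fieldnames or [])
            if name in KEEP_EXACT or name.startswith(KEEP_PREFIX)
        ]
        with open(dst_path, "w", newline="") as dst:
            # extrasaction="ignore" silently drops every column not in `kept`.
            writer = csv.DictWriter(dst, fieldnames=kept, extrasaction="ignore")
            writer.writeheader()
            for row in reader:
                writer.writerow(row)
    return kept
```

Writing to a separate output path leaves the file obtained from MW's archive untouched, so the filter can be rerun if needed.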

## Computational requirements
All requirements must be installed and set up for command line usage. For further detail, see the **Command Line Usage** section below.

We manage Python and R installations using conda or miniconda. 
To build the repository as-is, the following applications are additionally required:
* LyX 2.4.1
* R 4.2.3
* Python 3.10.12
* Matlab R2022b

This software is used by the scripts contained in the `setup` folder of the repository. Instructions to set up the environment are found below in the section `Instructions to Replicators`.

### Software Requirements
Creating the conda environment from `setup/conda_env.yaml` installs all the R and Python dependencies. Please refer to the section `Instructions to Replicators` for detailed steps on how to install the required environment and run the scripts.
Below we list the software and packages required to run the repository, with the versions used.

  - Python 3.10.12
    - The file `setup/conda_env.yaml` lists the required Python packages with version numbers.
  - R 4.2.3
    - The file `setup/conda_env.yaml` lists the required R packages with version numbers.
  - LyX 2.4.1
    - We last compiled the draft with MacTeX 2022.

### Controlled Randomness

1. The program `/analysis_cluster/code/Sim/Sim.m` sets a random seed in line 21.

### Memory and Runtime Requirements

#### Summary

Approximate time needed to reproduce the analyses on a standard desktop machine:

- [ ] <10 minutes
- [ ] 10-60 minutes
- [ ] 1-2 hours
- [ ] 2-8 hours
- [ ] 8-24 hours
- [ ] 1-3 days
- [ ] 3-14 days
- [ ] \>14 days
- [X] Not feasible to run on a desktop machine, as described below.

Full replication with all source data requires a cluster.

#### Details

The `analysis_cluster` code was last run on an **Intel server with about 400 CPUs and 20 GB of memory per CPU**. Computation time was about 36 hours. Most computation time is spent on data simulation and model estimation.

## Description of programs/code
In this replication archive: 

- The folder `/dropbox/` contains raw data and documentation. We exclude data and documentation for which we do not have permission to post. Each folder contains a README that describes the original source of the data, using links that were active at the time of acquisition.  

- The folder `/analysis_cluster/` contains code used for running simulations and estimation. This code should be executed on a cluster. `submit_jobs.py` governs the submission of jobs on the cluster server sequentially, following this pipeline: `Replicate.sh` → `Reestimate.sh` → `Sim.sh` → multiple shell scripts that call `EstimateUnified.m` for different models and IVs → `OutputResults.sh` → `ExportCSV.sh`. These scripts output an `ExportCSVdraft.csv` file, which is stored under the path `analysis_cluster/output`. This file serves as an intermediate data file for generating figures and scalars in the paper.
  * In the event that some jobs do not finish, e.g., due to cluster timeouts, `run_missing_jobs.py` can be used to execute these, after which `OutputResults.sh` → `ExportCSV.sh` can be run manually to update the output.

- The folder `/analysis_local/` contains code used for generating the figures and scalars in the paper. This code can be executed using local computing. The programs in the `/analysis_local/` folder can be run sequentially using the contained `make.py`:
    * `code/pull_estimates.R`
    * `code/plot_sim.R`
    * `code/plot_isobias.R`

- The folder `/paper_slides/` contains all the inputs and files necessary to compile the paper. The subfolder `/paper_slides/figures/` contains LyX files for each figure. The subfolder `/paper_slides/tables/` contains LyX files for each table. The subfolder `/paper_slides/code/` contains the paper LyX file.
  * `code/includedIV.lyx`
  * `figures/simulation_baseline.lyx` (Figure 2)
  * `figures/simulation_coarse.lyx` (Figure 3)
  * `figures/simulation_product.lyx` (Figure 4)
  * `figures/simulation_nonlinear.lyx` (Figure 5)
  * `figures/full_dag.lyx` (Appendix Figure 1)
  * `figures/isobias.lyx` (Appendix Figure 2)
  * `figures/simulation_mae.lyx` (Appendix Figure 3)
- The folder `/lib/` contains auxiliary functions and helpers.
- The folder `/setup/` contains files to set up the conda environment and to install the R and Python dependencies.
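
The sequential local build described for `/analysis_local/` above can be pictured with the following sketch. It is not the package's `make.py`, whose implementation may differ. The function names and the injectable `runner` argument are hypothetical, and the sketch assumes each R script is launched with `Rscript` from inside `analysis_local`.

```python
import subprocess

# Scripts listed in this README, in the order they are run.
LOCAL_SCRIPTS = [
    "code/pull_estimates.R",
    "code/plot_sim.R",
    "code/plot_isobias.R",
]


def build_commands(r_executable="Rscript", scripts=LOCAL_SCRIPTS):
    """Return one command (as an argument list) per script, in run order."""
    return [[r_executable, script] for script in scripts]


def run_sequentially(commands, cwd="analysis_local", runner=subprocess.run):
    """Run each command in turn; check=True stops at the first failure."""
    for command in commands:
        runner(command, cwd=cwd, check=True)
```

Because `check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit status, a failure in `pull_estimates.R` prevents the plotting scripts from running on stale or missing estimates.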

## Instructions to Replicators

### Replication process
#### Setup 

1. Check that `config_user.yaml` is located in the root of the unzipped replication archive, alongside this README. This file can be copied from the `setup` directory. If necessary, edit the external paths within. See the **User Configuration** section below for further detail.

2. Install miniconda and a JDK, which are used to manage the R/Python virtual environment. If you already have conda set up on your local machine, feel free to skip this step. If not, this step installs a lightweight version of conda that will not interfere with your current Python and R installations.

   You can install these programs from their websites [here for miniconda](https://docs.conda.io/en/latest/miniconda.html) and [here for jdk](https://www.oracle.com/java/technologies/javase-downloads.html). If you use Homebrew (which can be downloaded [here](https://brew.sh/)), these two programs can be installed as follows:
      ```
      brew install --cask miniconda
      brew install --cask oracle-jdk
      ```
   Once you have done this you need to initialize conda by running the following lines and restarting your terminal:
      ```
      conda config --set auto_activate_base false
      conda init $(echo $0 | cut -d'-' -f 2)
      ```

3. Create the conda environment with the commands:
      ```
      conda config --set channel_priority strict
      conda env create -f setup/conda_env.yaml
      ```
   
   To activate the conda virtual environment just created, run
      ```
      conda activate blp-instruments
      ```
   The environment should be active throughout setup, and whenever executing modules within the project in the future. If you wish to return to your base environment, you can deactivate the conda environment with
      ```
      conda deactivate
      ```


4. Run the `/setup/check_setup_for_replication.py` file. One way to do this is to run the following bash command in a terminal from the root of the replication archive:
   ```
   cd setup && python check_setup_for_replication.py && cd ..
   ```

#### Build

1. Follow the *Setup* instructions above.

2. From the root of the replication archive, run the following command in a bash terminal to reproduce all figures. This can be done using local computing.
   ```
   python run_local_for_replication.py
   ```
   If all proprietary data have been obtained, instead run the following command in a bash terminal to fully reproduce all output. This should be done on a cluster.
   ```
   python run_all_for_replication.py
   ```

### Command Line Usage

For specific instructions on how to set up command line usage for an application, refer to the [Gentzkow template wiki](https://github.com/gentzkow/template/wiki/Command-Line-Usage).

By default, the repository assumes the following executable names for the following applications:

```
application : executable
python      : python
git-lfs     : git-lfs
lyx         : lyx
r           : Rscript
```

Default executable names can be updated in `config_user.yaml`. For further detail, see the **User Configuration** section below.
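
As a quick sanity check, the sketch below looks up the default executable names listed above on the system `PATH`. It is an illustration only and is not the package's `check_setup_for_replication.py`. The function names are hypothetical, and if you have changed any executable name in `config_user.yaml` you would need to pass your own mapping.

```python
import shutil

# Default executable names listed in this README.
DEFAULT_EXECUTABLES = {
    "python": "python",
    "git-lfs": "git-lfs",
    "lyx": "lyx",
    "r": "Rscript",
}


def find_executables(executables=DEFAULT_EXECUTABLES):
    """Map each application to the resolved path of its executable, or None."""
    return {app: shutil.which(exe) for app, exe in executables.items()}


def missing_applications(executables=DEFAULT_EXECUTABLES):
    """Return the applications whose executables were not found on PATH."""
    found = find_executables(executables)
    return sorted(app for app, path in found.items() if path is None)
```

Calling `missing_applications()` with the conda environment active returns an empty list when every default executable resolves. Any name it returns points to an application that still needs to be set up for command line usage or renamed in `config_user.yaml`.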

### User Configuration
`config_user.yaml` contains settings and metadata such as local paths that are specific to an individual user. For this repository, this includes local paths to [external dependencies](https://github.com/gentzkow/template/wiki/External-Dependencies) as well as executable names for locally installed software.

Required applications may be set up for command line usage on your computer with a different executable name from the default. If so, specify the correct executable name in `config_user.yaml`. This configuration step is explained further in `config_user.yaml` and the [repo wiki](https://github.com/gentzkow/template/wiki/Repository-Structure#Configuration-Files).

### Windows Differences

If you are using Windows, you may need to run certain bash commands in administrator mode due to permission errors. To do so, open your terminal by right clicking and selecting `Run as administrator`. To set administrator mode on permanently, refer to the [repo wiki](https://github.com/gentzkow/template/wiki/Repository-Usage#Administrator-Mode).

The executable names are likely to differ on your computer if you are using Windows. Executable names for Windows will typically look like the following:

```
application : executable
python      : python
git-lfs     : git-lfs
lyx         : LyX#.# (where #.# refers to the version number)
r           : Rscript
```

## List of tables and programs
The provided code reproduces:

- [ ] All numbers provided in text in the paper
- [ ] All tables and figures in the paper
- [X] Selected tables and figures in the paper, as explained and justified below.

  * `figures/simulation_baseline.lyx` (Figure 2)
  * `figures/simulation_coarse.lyx` (Figure 3)
  * `figures/simulation_product.lyx` (Figure 4)
  * `figures/simulation_nonlinear.lyx` (Figure 5)
  * `figures/full_dag.lyx` (Appendix Figure 1)
  * `figures/isobias.lyx` (Appendix Figure 2)
  * `figures/simulation_mae.lyx` (Appendix Figure 3)

| Figure/Table #     |LyX Program (paper_slides/)              | Program (from repository root) | Line # | Program Output File (from repository root) | Note                            |
|--------------------|-----------------------------------------|--------------------------|--------|----------------------------------|---------------------------------|
| Figure 1           |n.a.                   | n.a.                     | --- |n.a.                              | ... |
| Figure 2           |figures/simulation_baseline.lyx                   | analysis_local/code/plot_sim.R |34, 65| analysis_local/output/plots/medbias_logit_varying_X.pdf, analysis_local/output/plots/medbias_logit_varying_D.pdf| ...|
| Figure 3           |figures/simulation_coarse.lyx                   | analysis_local/code/plot_sim.R |50, 81, 113| analysis_local/output/plots/medbias_logit_varying_X_partial_resid.pdf, analysis_local/output/plots/medbias_logit_varying_D_partial_resid.pdf, analysis_local/output/plots/medbias_dropmonth_varying_D.pdf| ...|
| Figure 4           |figures/simulation_product.lyx                   | analysis_local/code/plot_sim.R |128, 144, 160| analysis_local/output/plots/medbias_logit_varying_X_prod_resd.pdf, analysis_local/output/plots/medbias_logit_varying_D_prod_resd.pdf, analysis_local/output/plots/medbias_logit_varying_D_prod_resd_sh.pdf| ...|
| Figure 5           |figures/simulation_nonlinear.lyx                   | analysis_local/code/plot_sim.R |179, 195| analysis_local/output/plots/medbias_rcl23_varying_X.pdf, analysis_local/output/plots/medbias_rcl23_varying_D.pdf| ...|
| Online Appendix Figure 1           |figures/full_dag.lyx                   | n.a.  | --- | n.a. | ...|
| Online Appendix Figure 2           |figures/isobias.lyx                   | analysis_local/code/plot_isobias.R |50| analysis_local/output/plots/isobias.pdf| ...|
| Online Appendix Figure 3          |figures/simulation_mae.lyx                   | analysis_local/code/plot_sim.R |218, 234| analysis_local/output/plots/mae_logit_varying_X_partial_resid.pdf, analysis_local/output/plots/mae_logit_varying_D_partial_resid.pdf| ...|
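
After a local build, the output files named in the table above can be checked with a short script such as the following. This is a sketch and not part of the replication package. The function name `missing_plots` is hypothetical, and the file names are copied from the table.

```python
from pathlib import Path

# Output directory and file names as listed in the table in this README.
PLOT_DIR = Path("analysis_local") / "output" / "plots"

EXPECTED_PLOTS = {
    "Figure 2": [
        "medbias_logit_varying_X.pdf",
        "medbias_logit_varying_D.pdf",
    ],
    "Figure 3": [
        "medbias_logit_varying_X_partial_resid.pdf",
        "medbias_logit_varying_D_partial_resid.pdf",
        "medbias_dropmonth_varying_D.pdf",
    ],
    "Figure 4": [
        "medbias_logit_varying_X_prod_resd.pdf",
        "medbias_logit_varying_D_prod_resd.pdf",
        "medbias_logit_varying_D_prod_resd_sh.pdf",
    ],
    "Figure 5": [
        "medbias_rcl23_varying_X.pdf",
        "medbias_rcl23_varying_D.pdf",
    ],
    "Online Appendix Figure 2": ["isobias.pdf"],
    "Online Appendix Figure 3": [
        "mae_logit_varying_X_partial_resid.pdf",
        "mae_logit_varying_D_partial_resid.pdf",
    ],
}


def missing_plots(repo_root="."):
    """Return {figure: [missing file names]} for figures with absent outputs."""
    plot_dir = Path(repo_root) / PLOT_DIR
    missing = {}
    for figure, names in EXPECTED_PLOTS.items():
        absent = [name for name in names if not (plot_dir / name).is_file()]
        if absent:
            missing[figure] = absent
    return missing
```

Run from the root of the replication archive, `missing_plots()` returns an empty dictionary when every listed plot exists. Figure 1 and Online Appendix Figure 1 are omitted because the table lists no program output for them.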



## References

Nathan H Miller and Matthew C Weinberg. Understanding the price effects of the MillerCoors joint venture. Econometrica, 85(6):1763–1791, 2017. [DOI](https://doi.org/10.3982/ECTA13333). [Link to replication archive](https://www.econometricsociety.org/publications/econometrica/2017/11/01/understanding-price-effects-millercoors-joint-venture/supp/13333_Data_and_Programs.zip) as of March 2025.
